Virtualized Data Storage Over Wide-Area Networks

ABSTRACT

Virtual storage arrays consolidate branch data storage at data centers connected via wide area networks. Virtual storage arrays appear to storage clients as local data storage; however, virtual storage arrays actually store data at the data center. The virtual storage arrays overcome bandwidth and latency limitations of the wide area network by predicting and prefetching storage blocks, which are then cached at the branch location. Virtual storage arrays leverage an understanding of the semantics and structure of high-level data structures associated with storage blocks to predict which storage blocks are likely to be requested by a storage client in the near future. Virtual storage arrays determine the association between requested storage blocks and corresponding high-level data structure entities to predict additional high-level data structure entities that are likely to be accessed. From this, the virtual storage array identifies the additional storage blocks for prefetching.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to U.S. Provisional Patent Application No. 61/162,463, entitled “Virtualized Data Storage Over Wide-Area Networks”, filed Mar. 23, 2009; and is related to U.S. patent application Ser. No. ______ [Attorney Docket Number R001410US], entitled “Virtualized Data Storage System Architecture”, filed ______; U.S. patent application Ser. No. ______ [Attorney Docket Number R001411US], entitled “Virtualized Data Storage Cache Management”, filed ______; and U.S. patent application Ser. No. ______ [Attorney Docket Number R001420US], entitled “Virtual Data Storage System Optimizations”, filed ______; all of which are incorporated by reference herein for all purposes.

BACKGROUND

The present invention relates generally to data storage systems, and systems and methods to improve storage efficiency, compactness, performance, reliability, and compatibility. In computing, a file system specifies an arrangement for storing, retrieving, and organizing data files or other types of data on data storage devices, such as hard disk devices. A file system may include functionality for maintaining the physical location or address of data on a data storage device and for providing access to data files from local or remote users or applications.

Typically, data storage for multiple users and applications in an enterprise is implemented using a file server attached to one or more client systems and application servers via a local area network (LAN). The file server allows users and applications to access data via file-based network protocols, such as NFS or SMB/CIFS.

Many physical storage devices, such as hard disk drives, are too small, too slow, and too unreliable for enterprise storage operations. As a result, many file servers are connected with large numbers of remote data storage devices, such as disk arrays, tape libraries, and optical drive jukeboxes, via a storage area network (SAN). A storage area network appears to file and application servers as one or more locally attached storage devices. Storage area networks use protocols such as iSCSI and Fibre Channel Protocol to communicate with storage clients. These storage area network protocols are based on reading and writing blocks of data to storage devices and typically operate below the level of the file system.

Large organizations, such as enterprises, are often geographically spread out over many separate locations, referred to as branches. For example, an enterprise may have offices or branches in New York, San Francisco, and India. Each branch location may include its own internal local area network for exchanging data within the branch. Additionally, the branches may be connected via a wide area network, such as the internet, for exchanging data between branches.

Typical branch LAN installations also require data storage for their local client systems and application servers. For example, a typical branch LAN installation may include a file server for storing data for the client systems and application services. In prior systems, this branch's data storage is located at the branch site and connected directly with the branch LAN. Thus, each branch requires its own file server and associated data storage devices.

Deploying and maintaining file servers and data storage at a number of different branches is expensive and inefficient. Organizations often require on-site personnel at each branch to configure and upgrade each branch's data storage, and to manage data backups and data retention. Additionally, organizations often purchase excess storage capacity for each branch to allow for upgrades and growing data storage requirements. Because branches are serviced infrequently, due to their numbers and geographic dispersion, organizations often deploy enough data storage at each branch to allow for months or years of storage growth. However, this excess storage capacity often sits unused for months or years until it is needed, unnecessarily driving up costs.

Previously, some types of information technology infrastructure, such as application servers, from multiple branches have been consolidated to one or a small number of centralized data centers. These centralized data centers are connected with multiple branches via a wide area network, such as the internet. This consolidation of information technology infrastructure decreases costs and improves management efficiency. However, branch data storage is rarely consolidated at a remote data center, because the intervening WAN is slow and has high latency, making storage accesses unacceptably slow for client systems and application servers. Thus, organizations have previously been unable to consolidate data storage from multiple branches.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention will be described with reference to the drawings, in which:

FIGS. 1A-1B illustrate virtual storage array systems according to an embodiment of the invention;

FIGS. 2A-2B illustrate a method of optimizing data reads in a virtual storage array system according to an embodiment of the invention;

FIG. 3 illustrates a method of optimizing data writes in a virtual storage array system according to an embodiment of the invention;

FIGS. 4A-4B illustrate data migration of a virtual storage array system according to an embodiment of the invention;

FIG. 5 illustrates a method of creating data snapshots of a virtual storage array according to an embodiment of the invention;

FIG. 6 illustrates an example optimized data compression and deduplication using file-system or other storage format awareness according to an embodiment of the invention;

FIG. 7 illustrates an example virtual machine implementation of a virtual storage array interface according to an embodiment of the invention; and

FIG. 8 illustrates an example computer system capable of implementing a virtual storage array interface according to an embodiment of the invention.

SUMMARY

An embodiment of the invention uses virtual storage arrays to consolidate branch location-specific data storage at data centers connected with branch locations via wide area networks. The virtual storage array appears to a storage client as a local branch data storage; however, embodiments of the invention actually store the virtual storage array data at a data center connected with the branch location via a wide-area network. In embodiments of the invention, a branch storage client accesses the virtual storage array using storage block based protocols.

Embodiments of the invention overcome the bandwidth and latency limitations of the wide area network between branch locations and the data center by predicting storage blocks likely to be requested in the future by the branch storage client and prefetching and caching these predicted storage blocks at the branch location. When this prediction is successful, storage block requests from the branch storage client may be fulfilled in whole or in part from the branch location's storage block cache. As a result, the latency and bandwidth restrictions of the wide-area network are hidden from the storage client.

The branch location storage client uses storage block-based protocols to specify reads, writes, modifications, and/or deletions of storage blocks. However, servers and higher-level applications typically access data in terms of files in a structured file system, relational database, or other high-level data structure. Each entity in the high-level data structure, such as a file or directory, or database table, node, or row, may be spread out over multiple storage blocks at various non-contiguous locations in the storage device. Thus, prefetching storage blocks based solely on their locations in the storage device is unlikely to be effective in hiding wide-area network latency and bandwidth limits from storage clients.

An embodiment of the invention leverages an understanding of the semantics and structure of the high-level data structures associated with the storage blocks to predict which storage blocks are likely to be requested by a storage client in the near future. To do this, an embodiment of the invention determines the association between requested storage blocks and the corresponding high-level data structure entities, such as files, directories, or database elements. Once this embodiment has identified one or more of the high-level data structure entities associated with a requested storage block, this embodiment of the invention identifies additional portions of the same or other high-level data structure entities that are likely to be accessed by the storage client. This embodiment of the invention then identifies the additional storage blocks corresponding to these additional high-level data structure entities. The additional storage blocks are then prefetched and cached at the branch location.

A further embodiment of the invention also hides wide-area network latency and bandwidth limits from storage clients during write operations by caching write requests from storage clients at their associated branch locations. Once a write request is cached at the branch location, the write request is acknowledged as complete to the storage client, allowing the storage client to continue operations. The cached write requests are then transferred from the branch location to the data center independently of the storage clients' operations. Once a new or updated storage block is stored in the storage block cache, it may be accessed by storage clients prior to its transfer to the data storage.
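The write handling described above can be illustrated with a minimal sketch of a write-back cache. This is not the implementation of any embodiment; the class name BranchWriteCache, its method names, and the dictionary standing in for the data center storage are all assumptions made for illustration.

```python
from collections import OrderedDict


class BranchWriteCache:
    """Hypothetical write-back cache at a branch location."""

    def __init__(self, data_center_store):
        # data_center_store is a stand-in dict: block number -> bytes.
        self.data_center_store = data_center_store
        self.cache = {}               # block number -> latest data
        self.pending = OrderedDict()  # writes not yet sent over the WAN

    def write(self, block, data):
        """Cache the write and acknowledge it immediately."""
        self.cache[block] = data
        self.pending[block] = data
        self.pending.move_to_end(block)
        return "ACK"

    def read(self, block):
        """Serve cached blocks first, even if not yet transferred."""
        if block in self.cache:
            return self.cache[block]
        return self.data_center_store.get(block)

    def flush_one(self):
        """Transfer the oldest pending write to the data center."""
        if not self.pending:
            return None
        block, data = self.pending.popitem(last=False)
        self.data_center_store[block] = data
        return block
```

In this sketch, flush_one would be driven by a background task so that transfers proceed independently of the storage client, which has already received its acknowledgement.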

An additional embodiment allows for snapshots of the virtual storage array data. In this embodiment, a snapshot is prepared by setting the virtual storage array interface to a quiescent state and identifying any new or updated storage blocks in the branch location's storage block cache. The virtual storage array interface may then be set to an active state and resume normal operations. Upon receiving a snapshot request, the virtual storage array interface transfers these identified storage blocks to the data center, if they have not already been transferred as part of the normal write request process. If a storage client tries to modify a storage block previously identified as new or updated prior to a snapshot request, an embodiment of the branch virtual storage array interface makes a copy of this storage block. The modification is then applied to the copy of this storage block. Storage clients accessing this storage block will receive the modified copy of this storage block. However, the unmodified version of the storage block may be used to fulfill a subsequent snapshot request.
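The copy-on-write behavior described above can be sketched as follows. The sketch omits the quiescent and active state transitions, and the class name SnapshotCache and its methods are hypothetical names chosen for illustration rather than elements of the described interface.

```python
class SnapshotCache:
    """Hypothetical branch cache that preserves blocks for a snapshot."""

    def __init__(self):
        self.blocks = {}     # block number -> data seen by storage clients
        self.dirty = set()   # new or updated blocks in the cache
        self.marked = set()  # blocks identified during snapshot preparation
        self.frozen = {}     # block number -> unmodified copy for snapshot

    def write(self, block, data):
        if block in self.marked and block not in self.frozen:
            # Keep the unmodified version before applying the change.
            self.frozen[block] = self.blocks[block]
        self.blocks[block] = data
        self.dirty.add(block)

    def read(self, block):
        # Storage clients always receive the modified version.
        return self.blocks[block]

    def prepare_snapshot(self):
        # Identify the new or updated blocks at preparation time.
        self.marked = set(self.dirty)
        self.frozen = {}

    def snapshot_blocks(self):
        # Blocks to transfer when a snapshot request arrives.
        return {b: self.frozen.get(b, self.blocks[b]) for b in self.marked}
```

A block written after preparation is visible to storage clients immediately, while snapshot_blocks still returns the version that existed when the snapshot was prepared.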

Another embodiment of the invention analyzes a selected high-level data structure entity to identify portions of the same or other high-level data structure entities that are likely to be accessed by the storage client. This embodiment of the invention then identifies the additional storage blocks corresponding to these additional high-level data structure entities. The additional storage blocks are then prefetched and cached at the branch location. This embodiment of the invention may also identify additional high-level data structure entities to analyze based on its analysis of previously selected high-level data structure entities.

Further embodiments of the invention may identify corresponding high-level data structure entities directly from requests for storage blocks.

DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS

FIGS. 1A-1B illustrate virtual storage array systems according to an embodiment of the invention. FIG. 1A illustrates an example system 100 including virtual storage arrays according to an embodiment of the invention. The example system 100 includes two branches 105 a and 105 b, each of which has its own internal local area network (LAN), and a data center 110, which also includes its own LAN. The two branch networks 105 and the data center network 110 are connected by one or more wide area networks (WANs) 115, such as the internet. Although FIG. 1A shows two branches and one data center, embodiments of the invention can be implemented with any arbitrary number of branches and data centers.

Each of the branch LANs 105 may include routers, switches, and other wired or wireless network devices 107 for connecting with client systems and other devices, such as network devices 107 a and 107 b. For example, each of the branch LANs 105 may connect one or more client systems 108, such as client system 108 a and 108 b, with one or more application servers 109, such as 109 a and 109 b. Application servers 109 provide applications and application functionality to the client systems 108.

Previously, typical branch LAN installations also required data storage for client systems and application servers. For example, a prior typical branch LAN installation may include a file server for storing data for the client systems and application servers, such as database servers and e-mail servers. In prior systems, this branch's data storage is located at the branch site and connected directly with the branch LAN. The branch data storage previously could not be located at the data center, because the intervening WAN is too slow and has high latency, making storage accesses unacceptably slow for client systems and application servers.

An embodiment of the invention allows for storage consolidation of branch-specific data storage at data centers connected with branches via wide area networks. This embodiment of the invention overcomes the bandwidth and latency limitations of the wide area network between branches and the data center. To this end, an embodiment of the invention includes virtual storage arrays.

A virtual storage array appears to branch users, such as branch client systems and branch application servers, as a storage array connected with the branch's local area network. A virtual storage array can be used for the same purposes as a local storage area network or other data storage device. For example, a virtual storage array may be used in conjunction with a file server for general-purpose data storage, in conjunction with a database server for database application storage, or in conjunction with an e-mail server for e-mail storage. However, the virtual storage array stores its data at a data center connected with the branch via a wide area network. Multiple separate virtual storage arrays, from different branches, may store their data in the same data center and, as described below, on the same storage devices.

Because the data storage of multiple branches is consolidated at a data center, the efficiency, reliability, cost-effectiveness, and performance of data storage are improved. An organization can manage and control access to its data storage at a central data center, rather than at large numbers of separate branches. This increases the reliability and performance of an organization's data storage. This also reduces the personnel required at branch offices to provision, maintain, and back up data storage. It also enables organizations to implement more effective backup systems, data snapshots, and disaster recovery for their data storage. Furthermore, organizations can plan for storage growth more efficiently, by consolidating their storage expansion for multiple branches and reducing the amount of excess unused storage. Additionally, an organization can apply optimizations such as compression or data deduplication over the data from multiple branches stored at the data center, reducing the total amount of storage required by the organization.

In an embodiment, virtual storage arrays are implemented at each of the branches 105 using branch virtual storage array interfaces 120, such as branch virtual storage array interfaces 120 a and 120 b. Any of the branch virtual storage array interfaces 120 may be a stand-alone computer system or network appliance or built into other computer systems or network equipment as hardware and/or software. In a further embodiment, any of the branch virtual storage array interfaces 120 may be implemented as a software application or other executable code running on a client system or application server.

In an embodiment, each of the branch virtual storage array interfaces 120 includes one or more storage array network interfaces and supports one or more storage array network protocols to connect with client systems and/or application servers within a branch local area network. Examples of storage array network interfaces suitable for use with embodiments of the invention include Ethernet, Fibre Channel, IP, and InfiniBand interfaces. Examples of storage array network protocols include ATA, Fibre Channel Protocol, and SCSI. Various combinations of storage array network interfaces and protocols are suitable for use with embodiments of the invention, including iSCSI, HyperSCSI, Fibre Channel over Ethernet, and iFCP. In cases where the storage array network interface uses Ethernet, an embodiment of the branch virtual storage array interface 120 can use the branch LAN's physical connections and networking equipment for communicating with client systems and application services. In other embodiments, separate connections and networking equipment, such as Fibre Channel networking equipment, are used to connect the branch virtual storage array interface 120 with client systems 108 and/or application servers 109.

In an embodiment, one or more of the branch LANs 105 can include a file server, for example built into one of the application servers 109, for providing a network file interface to the virtual storage array to client systems 108 and other application servers 109. In a further embodiment, the branch virtual storage array interface 120 is integrated as hardware and/or software with an application server 109, such as a file server, database server, or e-mail server. In this embodiment, the branch virtual storage array interface 120 can include application server interfaces, such as a network file interface, for interfacing with other application servers and/or client systems.

From the view of application servers 109 and client systems 108, a branch virtual storage array interface 120 appears to be a local storage array, having its data storage at the associated branch 105. For example, branch virtual storage array 120 a appears to clients 108 a and application server 109 a as a local data storage array on branch LAN 105 a. However, the branch virtual storage array interfaces 120 actually store and retrieve data from storage devices located on the data center LAN 110. Because virtual storage array data accesses must travel via the WAN 115 between the data center LAN 110 and the branch LANs 105, the virtual storage arrays are subject to the latency and bandwidth restrictions of the WAN 115.

In an embodiment, the branch virtual storage array interfaces 120 include virtual storage array caches 122, such as virtual storage array caches 122 a and 122 b for virtual storage array interfaces 120 a and 120 b respectively, which are used to ameliorate the effects of the WAN 115 on virtual storage array performance. As described in detail below, virtual storage array data accesses, including data reads and data writes, can be optimized to minimize the effect of WAN bandwidth restrictions and latency.

Additionally, an embodiment of the invention includes a data center virtual storage array interface 125 located on the data center LAN 110. In an embodiment, the data center virtual storage array interface 125 communicates with one or more branch virtual storage interfaces 120 via the data center LAN 110, the WAN 115, and their respective branch LANs 105. Data communications between virtual storage interfaces 120 and 125 can be in any form and/or protocol used for carrying data over wired and wireless data communications networks, including TCP/IP.

The data center virtual storage array interface 125 translates data communications from branch virtual storage array interfaces 120 into storage accesses of a physical storage array network. To this end, an embodiment of a data center virtual storage array interface 125 accesses a physical storage array network interface 127, which in turn accesses physical data storage devices 129 on a storage array network. Examples of data storage devices 129 include physical data storage array devices 129 a and data backup devices 129 b. In another embodiment, the data center virtual storage array interface 125 includes one or more storage array network interfaces and supports one or more storage array network protocols for directly connecting with a physical storage array network and its data storage devices 129. Examples of storage array network interfaces suitable for use with embodiments of the invention include Ethernet, Fibre Channel, IP, and InfiniBand interfaces. Examples of storage array network protocols include ATA, Fibre Channel Protocol, and SCSI. Various combinations of storage array network interfaces and protocols are suitable for use with embodiments of the invention, including iSCSI, HyperSCSI, Fibre Channel over Ethernet, and iFCP. Embodiments of the data center virtual storage array interface 125 may connect with the physical storage array interface 127 and/or directly with the physical storage array network using the Ethernet network of the data center LAN and/or separate data communications connections, such as a Fibre Channel network.

In a further embodiment, branch 105 and data center LANs 110 may optionally include network optimizers 130 for improving the performance of data communications over the WAN 115 between branches and/or the data center. Network optimizers 130 can improve actual and perceived WAN network performance using techniques including compressing data communications; anticipating and prefetching data; caching frequently accessed data; shaping and restricting network traffic; and optimizing usage of network protocols. In an embodiment, network optimizers 130 may be used in conjunction with virtual storage array interfaces 120 and 125 to further improve virtual storage array performance accessing data via the WAN 115. In other embodiments, network optimizers 130 may ignore or pass-through virtual storage array data traffic, relying on the virtual storage array interfaces 120 and 125 on the branch 105 and data center LANs 110 to optimize WAN performance.

Further embodiments of the invention may be used in different network architectures. For example, a data center virtual storage array interface may be connected directly between a WAN and a physical data storage array, eliminating the need for a data center LAN. Similarly, a branch virtual storage array interface, implemented for example in the form of a software application executed by a storage client computer system, may be connected directly with a WAN, such as the internet, eliminating the need for a branch LAN.

FIG. 1B illustrates an example arrangement 150 of data within virtual and physical storage array networks according to an embodiment of the invention. In this example 150, two branches 155 a and 155 b each include a branch virtual storage array interface 160 a and 160 b and associated virtual storage array cache 165 a and 165 b, respectively. As discussed in detail below, each of the virtual storage array caches 165 is used to store prefetched virtual storage array network data and pending virtual storage array write data for its branch's respective virtual storage array.

In an embodiment, each of the branches 155 includes its own separate virtual storage array, which appears to be located within its branch LAN 155. However, the majority of the data storage of a branch's virtual storage array is located within the data center LAN 170 on one or more physical data storage devices 175. The data center LAN 170 is connected with the branch LANs 155 via WAN 185. In an embodiment, each branch's virtual storage array data is stored within a physical storage area network at the data center LAN 170. The physical storage area network may store virtual storage array data 180 for two or more branches. For example, physical data storage array 175 stores virtual storage array data 180 a and 180 b, which correspond with the data of the virtual storage arrays for branch 155 a and 155 b, respectively.

In a further embodiment, data optimizations such as data compression and data deduplication can be applied to each branch's virtual storage array data 180 separately or may be consolidated over multiple branches' virtual storage array data 180. For example, redundant data within a single branch's virtual storage array data within the data center's physical storage array network can be compressed or deduplicated to reduce storage requirements. In a further example, if two or more branches' virtual storage arrays include the same or similar data, compression or data deduplication can be applied over all of these virtual storage arrays, such that only a single copy of the redundant data needs to be stored in the physical storage area network. In this example, each of the separate branch virtual storage arrays will reference this single copy of the redundant data. For example, the virtual storage array data 180 a of branch 155 a can be compressed or deduplicated together with the virtual storage array data 180 b of branch 155 b so that there is only a single copy of any redundant data found in both virtual storage arrays.
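One possible way to store a single copy of redundant data shared by several branches is content-addressed storage, sketched below. The use of SHA-256 digests as block identifiers is an assumption made for this illustration; the description above does not specify a deduplication technique. The class name DedupStore and the branch labels are likewise hypothetical.

```python
import hashlib


class DedupStore:
    """Hypothetical data center store shared by several branches."""

    def __init__(self):
        self.chunks = {}  # digest -> block data, stored once
        self.refs = {}    # (branch, block number) -> digest

    def put(self, branch, block, data):
        digest = hashlib.sha256(data).hexdigest()
        # Store the data only if no branch has stored it already.
        self.chunks.setdefault(digest, data)
        self.refs[(branch, block)] = digest

    def get(self, branch, block):
        # Each branch's block references the single shared copy.
        return self.chunks[self.refs[(branch, block)]]

    def stored_copies(self):
        return len(self.chunks)
```

If two branches each store a block with identical contents, both references resolve to one stored copy, so stored_copies grows only when previously unseen data arrives.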

In another embodiment, the virtual storage array can be used to provide “cloud” storage for network-based applications.

An embodiment of the invention prefetches virtual storage array data to improve data read performance of the virtual storage array. In an embodiment, the branch or data center virtual storage array interface analyzes read and write accesses to a branch's virtual storage array to predict which storage blocks may be accessed in the future. The branch or data center virtual storage array interface then retrieves some or all of these predicted storage blocks and stores them in the branch's virtual storage array cache. If a storage client, such as an application server, file server, or client system, later requests access to one or more of the cached storage blocks, the branch virtual storage array interface retrieves the requested storage block from the virtual storage array cache, rather than retrieving the storage block from the physical storage devices located in the data center LAN via the WAN. This storage block prefetching hides the bandwidth and latency of the WAN from the storage client, making the virtual storage array appear as if it is a local storage device.

One complication with storage block prefetching is that sequential data within a file system or file is not necessarily stored as contiguous storage blocks within a storage area network. Similar complications occur when accessing databases or application data, such as e-mail data. This complication is illustrated by FIG. 2A. FIG. 2A illustrates an example 200 of a storage client 205 opening an example file “Foo.txt” and reading the first five file system blocks or clusters of this file. These file protocol reads may be performed using any file system protocol, such as CIFS, NFS, or NTFS. This sequence of file protocol reads is received by a file server 210. The file server 210 translates these file protocol reads into one or more storage area network reads. Each storage area network read retrieves one or more storage blocks from the virtual or physical storage area network 215. The storage area network reads may use any storage area network protocol, such as iSCSI or other protocols discussed above. The sizes and boundaries of file system blocks and storage area network blocks are independent of each other; thus each file system block may correspond with a fraction of a storage area network block, a single storage area network block, or multiple storage area network blocks.

In this example, file system block 0 corresponds with storage area network blocks 101 and 200. File system block 1 corresponds with storage area network block 14. File system block 2 corresponds with storage area network block 25. File system block 3 corresponds with storage area network block 26. File system block 4 corresponds with storage area network block 12. As shown in this example, the first five file system blocks of a file in a file system correspond to six non-sequential storage area network blocks.
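
The correspondence in this example can be written out as a small lookup table. The following is a minimal sketch in Python for illustration only; the names FS_TO_SAN and san_blocks_for are hypothetical and do not appear in the figures.

```python
# Mapping taken from the FIG. 2A example:
# file system block -> storage area network (SAN) blocks.
FS_TO_SAN = {
    0: [101, 200],
    1: [14],
    2: [25],
    3: [26],
    4: [12],
}


def san_blocks_for(fs_blocks):
    """Return the SAN blocks backing the given file system blocks, in read order."""
    result = []
    for fs_block in fs_blocks:
        result.extend(FS_TO_SAN[fs_block])
    return result


blocks = san_blocks_for(range(5))
print(blocks)  # [101, 200, 14, 25, 26, 12]
```

A sequential read of five file system blocks therefore appears on the storage area network as six block reads whose addresses are not in ascending order.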

Typically, if a storage client requests the first five file system blocks of a file, one optimization would be to prefetch and cache additional file blocks in this sequence, such as the next five file system blocks. However, because the storage area network blocks corresponding with this sequence of file blocks are not sequential, storage area network interfaces, which typically only receive requests for storage area network blocks, cannot accurately identify the storage area network blocks corresponding with a predicted sequence of file blocks.

FIG. 2B illustrates a method 250 of performing reactive prefetching of storage blocks according to an embodiment of the invention. Step 255 receives a storage block read request from a storage client, such as a client system or application server, at the branch location. In an embodiment, the storage block read request may be received by a branch location virtual data storage array interface. The storage block read request may be received using a storage area network protocol, such as iSCSI.

In response to the receipt of the storage block read request in step 255, decision block 260 determines if the requested storage block has been previously retrieved and stored in the storage block read cache at the branch location. If so, step 270 retrieves the requested storage block from the storage block read cache and returns it to the requesting storage client. In an embodiment, if the system includes a data center virtual storage array interface, then step 270 also forwards the storage block read request back to the data center virtual storage array interface for use in identifying additional storage blocks likely to be requested by the storage client in the future.

If the storage block read cache at the branch location does not include the requested storage block, step 265 retrieves the requested storage block via a WAN connection from the virtual storage array data located in a physical data storage at the data center. In an embodiment, a branch location virtual storage array interface forwards the storage block read request to the data center virtual storage array interface via the WAN connection. The data center virtual storage array interface then retrieves the requested storage block from the physical storage array and returns it to the branch location virtual storage array interface, which in turn provides this requested storage block to the storage client. In a further embodiment of step 265, a copy of the retrieved storage block may be stored in the storage block read cache for future accesses.
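
The read path of steps 255 to 270 can be sketched as follows. This is an illustrative sketch only, not a definitive implementation; the class BranchReadCache and the fetch_over_wan callable are hypothetical stand-ins for the branch virtual storage array interface and the WAN round trip to the data center.

```python
class BranchReadCache:
    """Hypothetical branch-side read cache keyed by storage block address."""

    def __init__(self, fetch_over_wan):
        self._fetch_over_wan = fetch_over_wan  # callable: address -> block data
        self._blocks = {}
        self.wan_fetches = 0

    def read(self, address):
        if address in self._blocks:           # decision block 260: cache hit
            return self._blocks[address]      # step 270: serve locally
        data = self._fetch_over_wan(address)  # step 265: fetch from the data center
        self.wan_fetches += 1
        self._blocks[address] = data          # keep a copy for future reads
        return data


physical = {14: b"block-14"}
cache = BranchReadCache(physical.__getitem__)
cache.read(14)
cache.read(14)
print(cache.wan_fetches)  # 1
```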

During and/or following the retrieval of the requested storage block from the virtual storage array or virtual storage array cache, steps 275 to 299 prefetch additional storage blocks likely to be requested by the storage client in the near future. Step 275 identifies a high-level data structure entity associated with the requested storage block. Examples of high-level data structure entities include file system entities such as files, directories, and file system blocks or clusters; and database structures such as database tables, rows, and nodes. Typical block storage protocols, such as iSCSI and FCP, specify block read requests using a storage block address or identifier. However, these storage block read requests do not include any identification of the associated high-level data structure entity, such as a specific file, directory, or database entity, that is associated with this storage block.

Therefore, an embodiment of step 275 identifies the high-level data structure entity corresponding with the requested storage block. In an embodiment of step 275, a branch or data center virtual storage array interface searches a file system data structure, such as an allocation table or tree, or a database data structure, such as a B-tree, to identify one or more high-level data structure entities corresponding with the requested storage block. In a further embodiment of step 275, a branch or data center virtual storage array interface preprocesses data structures to create other databases, tables, or other data structures adapted to facilitate searching for high-level data structure entities corresponding with storage blocks. These data structures mapping storage blocks to corresponding high-level data structure entities may be updated frequently or infrequently, depending upon the desired prefetching performance.

In a further embodiment, step 275 also determines a location or range of locations within the high-level data structure entity corresponding with the requested storage block. For example, a storage block may correspond with a specific range of addresses or offsets within a larger file.
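
One way to picture the preprocessed lookup structure of step 275 is a reverse index from storage block address to file name and byte range. The sketch below is illustrative only and assumes, for simplicity, that each file is an ordered list of fixed-size storage blocks; BLOCK_SIZE, build_reverse_index, and the extent layout are hypothetical.

```python
BLOCK_SIZE = 4096  # assumed storage block size in bytes


def build_reverse_index(file_extents):
    """Map each storage block address to (file name, first offset, last offset).

    file_extents maps a file name to its ordered list of storage block addresses.
    """
    index = {}
    for name, blocks in file_extents.items():
        for position, address in enumerate(blocks):
            start = position * BLOCK_SIZE
            index[address] = (name, start, start + BLOCK_SIZE - 1)
    return index


index = build_reverse_index({"Foo.txt": [101, 200, 14, 25, 26, 12]})
print(index[14])  # ('Foo.txt', 8192, 12287)
```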

Using the identification of the high-level data structure entity and optionally the location provided by step 275, step 280 identifies additional high-level data structure entities or portions thereof that are likely to be requested by the storage client. There are a number of different techniques for identifying additional high-level data structure entities or portions thereof for prefetching that may be used by embodiments of step 280. Some of these are described in detail in co-pending U.S. patent application Ser. No. ______ [Attorney Docket Number R001420US], entitled “Virtual Data Storage System Optimizations”, filed ______, which is incorporated by reference herein for all purposes.

One example technique used by an embodiment of step 280 is to prefetch portions of the high-level data structure entity based on their adjacency or close proximity to the identified portion of the entity. For example, if step 275 determines that the requested storage block corresponds with a portion of a file from file offset 0 up to offset 4095, then step 280 may identify a second portion of this same file beginning with offset 4096 for prefetching. It should be noted that although these two portions are adjacent in the high-level data structure entity, their corresponding storage blocks may be non-contiguous.
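
The adjacency heuristic can be sketched as a function that returns the byte range following the one just read. This is a minimal sketch under stated assumptions; the function name next_adjacent_range, the file_size parameter, and the default prefetch length are hypothetical.

```python
def next_adjacent_range(start, end, file_size, length=4096):
    """Return the (start, end) byte range that follows [start, end], or None at end of file."""
    next_start = end + 1
    if next_start >= file_size:
        return None
    next_end = min(next_start + length - 1, file_size - 1)
    return (next_start, next_end)


print(next_adjacent_range(0, 4095, file_size=10000))     # (4096, 8191)
print(next_adjacent_range(8192, 9999, file_size=10000))  # None
```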

Further embodiments of the invention may use other heuristics or other techniques to select predicted file system blocks, such as knowledge of application behavior associated with a file type. For example, application or protocol specific information may be used to identify storage blocks for prefetching and caching. For example, if the virtual storage array is used to store e-mail data, a branch or data center virtual storage array interface may identify an e-mail account or e-mail message ID associated with a requested storage block and then identify and prefetch storage blocks associated with the same user, with the same e-mail message ID, and/or with e-mail messages having nearby e-mail message IDs. This application or protocol specific information may be used alone or in conjunction with the above-described file system or database data.

Step 280 identifies all or portions of one or more high-level data structure entities for prefetching based on the high-level data structure entity associated with the requested storage block. However, as discussed above, storage clients specify data access requests in terms of storage blocks, not high-level data structure entities such as files, directories, or database entities. Thus, step 285 identifies one or more storage blocks corresponding with the high-level data structure entities identified for prefetching in step 280. In an embodiment, step 285 identifies additional storage blocks corresponding with the high-level data structure entities by accessing the data structures associated with a file system data structure, such as an allocation table or tree, or a database data structure, such as a B-tree, in a manner similar to a client system or application server requesting a high-level data structure entity. In another embodiment, step 285 accesses a separate data structure maintained by a virtual storage array interface to identify one or more storage blocks corresponding with the high-level data structure entities identified for prefetching.
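
The translation performed by step 285 can be illustrated as a lookup from a byte range of a file to the storage blocks that back it. The sketch below assumes a simplified layout in which a file is an ordered list of fixed-size storage blocks; BLOCK_SIZE, blocks_for_range, and foo_extent are hypothetical names.

```python
BLOCK_SIZE = 4096  # assumed storage block size in bytes


def blocks_for_range(extent, start, end):
    """Return the storage blocks backing bytes [start, end] of a file.

    extent is the file's ordered list of storage block addresses.
    """
    first = start // BLOCK_SIZE
    last = end // BLOCK_SIZE
    return extent[first:last + 1]


foo_extent = [101, 200, 14, 25, 26, 12]
print(blocks_for_range(foo_extent, 4096, 8191))   # [200]
print(blocks_for_range(foo_extent, 4096, 12287))  # [200, 14]
```

Note that the returned addresses need not be contiguous even though the requested byte range is.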

Decision block 290 determines if any of the storage blocks identified in step 285 have already been stored in the storage block read cache located at the branch location. If not, step 295 retrieves these uncached additional storage blocks from the virtual storage array data located in a physical data storage on the data center LAN and sends them via a WAN connection to the appropriate branch LAN. Step 299 stores these additional storage blocks in the branch's virtual storage array cache for potential future access by storage clients within the branch LAN. In a further embodiment, decision block 290 and the determination of whether an additional storage block has been previously retrieved and cached may be omitted. Instead, this embodiment can send all of the identified additional storage blocks to the branch virtual storage array interface to be cached. The branch virtual storage array interface may then discard any redundant storage blocks. This embodiment can be used when WAN latency, rather than WAN bandwidth, is the overriding concern.
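
Both variants of this step can be sketched in a few lines. The function blocks_to_send and its skip_cache_check flag are hypothetical and shown for illustration only; branch_cache stands for whatever record of cached block addresses the interface keeps.

```python
def blocks_to_send(candidates, branch_cache, skip_cache_check=False):
    """Return the storage block addresses to push over the WAN for prefetching."""
    if skip_cache_check:
        # Variant that omits decision block 290: send every identified block
        # and let the branch interface discard the ones it already holds.
        return list(candidates)
    # Decision block 290: send only blocks the branch has not cached yet.
    return [address for address in candidates if address not in branch_cache]


print(blocks_to_send([200, 14, 25], branch_cache={14}))                         # [200, 25]
print(blocks_to_send([200, 14, 25], branch_cache={14}, skip_cache_check=True))  # [200, 14, 25]
```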

Although the method 250 of FIG. 2B is described with respect to accessing files via the virtual storage array, embodiments of method 250 can also be applied to non-file based storage accesses. For example, an embodiment of method 250 can be applied to access databases via the virtual storage array. In this embodiment, portions of database tables or B-tree child nodes, rather than file system blocks, are used to identify corresponding storage blocks for prefetching and caching by a branch virtual storage array interface. In another example, indirect blocks of a file system may be used to identify additional storage blocks to be prefetched and cached.

Following step 299, method 250 proceeds to step 255 to await receipt of further storage block requests. The storage blocks added to the storage block read cache in previous iterations of method 250 may be available for fulfilling storage block read requests.

Method 250 may be performed by a branch virtual data storage array interface, by a data center virtual data storage array interface, or by both virtual data storage array interfaces working in concert. For example, steps 255 to 270 of method 250 may be performed by a branch location virtual storage array interface and steps 275 to 299 of method 250 may be performed by a data center virtual storage array interface. In another example, all of the steps of method 250 may be performed by a branch location virtual storage array interface.

Similarly, the virtual storage array cache can be used to hide latency and bandwidth limitations of the WAN during virtual storage array writes. FIG. 3 illustrates a method 300 of optimizing data writes in a virtual storage array system according to an embodiment of the invention.

An embodiment of method 300 starts with step 305 receiving a storage block write request from a storage client within the branch LAN. The storage block write request may be received by a branch virtual storage interface.

In response to the receipt of the storage block write request, decision block 310 determines if the virtual storage array cache is capable of accepting additional write requests or is full. In an embodiment, the virtual storage array cache may use some or all of its storage as a queue for pending virtual storage array operations.

If decision block 310 determines that the virtual storage array cache can accept an additional storage block write request, then, in an embodiment of method 300, step 315 stores the storage block write request, including the storage block data to be written, in the virtual storage array cache. In this embodiment of method 300, step 320 then sends a write acknowledgement to the storage client. Following the storage client's receipt of this write acknowledgement, the storage client believes its storage block write request is complete and can continue to operate normally. However, in step 325, the virtual storage array interface will transfer the queued written storage block via the WAN to the physical storage array at the data center LAN. In an embodiment, step 325 may perform this transfer in the background and asynchronously with the operation of storage clients.

While a storage block write request is queued and waiting to be transferred to the data center, a storage client may wish to access this storage block for a read or write. In this situation, the virtual storage array interface intercepts the storage block access request. In the case of a storage block read, the virtual storage array interface provides the storage client with the queued storage block. In the case of a storage block write, the virtual storage array interface will update the queued storage block data and send a write acknowledgement to the storage client for this additional storage block access.

Conversely, if decision block 310 determines that the virtual storage array cache cannot accept an additional storage block write request, then step 330 immediately transfers the storage block via the WAN to the physical storage array at the data center LAN. Following completion of this transfer, step 335 receives a write acknowledgement from the data center virtual storage array interface or the physical data storage array itself. Step 340 then sends a write acknowledgement to the storage client, allowing the storage client to resume normal operation.
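
The two write paths of method 300 can be sketched as follows. This is a simplified, single-threaded sketch and not a definitive implementation; the class WriteBackCache, its capacity measured in blocks, and the send_to_data_center callable are hypothetical.

```python
class WriteBackCache:
    """Hypothetical branch write queue with a fixed capacity, measured in blocks."""

    def __init__(self, capacity, send_to_data_center):
        self.capacity = capacity
        self._send = send_to_data_center  # callable standing in for the WAN transfer
        self.pending = {}                 # address -> data awaiting transfer

    def write(self, address, data):
        # A write to an already queued block updates it in place.
        if address in self.pending or len(self.pending) < self.capacity:
            self.pending[address] = data  # step 315: queue the write
            return "ack-early"            # step 320: acknowledge immediately
        self._send(address, data)         # step 330: transfer before acknowledging
        return "ack-after-transfer"       # steps 335 and 340

    def read_pending(self, address):
        """Serve a read of a block that is still queued, or None if not queued."""
        return self.pending.get(address)

    def flush(self):
        # Step 325: drain queued writes to the data center.
        for address in list(self.pending):
            self._send(address, self.pending.pop(address))


remote = {}
cache = WriteBackCache(capacity=1, send_to_data_center=remote.__setitem__)
print(cache.write(5, b"a"))  # ack-early
print(cache.write(6, b"c"))  # ack-after-transfer
```

In a real system step 325 would run in the background; here flush is called explicitly to keep the sketch deterministic.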

In a further embodiment, a virtual storage array interface may throttle storage block read and/or write requests from storage clients to prevent the virtual storage array cache from filling up under typical usage scenarios.

FIGS. 4A-4B illustrate data migration of a virtual storage array system according to an embodiment of the invention. Because the data storage of a branch's virtual storage array is located at a data center, rather than at the branch location, migrating data from one branch to another branch is straightforward. For example, FIG. 4A illustrates a first branch virtual storage interface 405 at a first branch 410 that provides access to a virtual storage array 415 a, having its virtual storage array data 420 stored at a data center 425. To migrate this example virtual storage array 415 a to a second branch, the first branch virtual storage array interface is configured to deactivate the first branch's access to the virtual storage array. A second branch virtual storage array interface at the second branch is then configured to access the virtual storage array data at the data center, thus providing the second branch with access to the virtual storage array.

Continuing from the example of FIG. 4A, FIG. 4B illustrates an example of a second branch virtual storage interface 430 at a second branch 435 that provides access to a virtual storage array 415 b, having its virtual storage array data 420 stored at a data center 425. In this example, the first branch virtual storage array interface 405 at the first branch 410 has been configured to deactivate the first branch's access to the virtual storage array. As a result, the second branch 435 has exclusive access to the virtual storage array data 420 via virtual storage array 415 b.

In a further embodiment, upon deactivating the virtual storage array 415 a at a first branch 410, the first branch virtual storage interface is adapted to transfer any updated storage data in its virtual storage array cache, such as new or updated storage blocks associated with pending write operations, back to the virtual storage array data 420 in the physical data storage array 440. This ensures that the virtual storage array data 420 maintained at the data center 425 is up to date.
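
The hand-off between branches can be sketched as a deactivate step that flushes pending writes followed by an activate step at the new branch. The sketch below is illustrative only; VirtualArray, deactivate, activate, and the branch names are hypothetical.

```python
class VirtualArray:
    """Hypothetical record of which branch currently holds a virtual storage array."""

    def __init__(self):
        self.active_branch = None
        self.data_center_blocks = {}  # stands in for the virtual storage array data


def deactivate(array, branch_name, pending_writes):
    """Flush the branch's pending writes to the data center, then release the array."""
    if array.active_branch != branch_name:
        raise ValueError("branch does not hold the array")
    array.data_center_blocks.update(pending_writes)
    pending_writes.clear()
    array.active_branch = None


def activate(array, branch_name):
    """Give a branch access, refusing while another branch is still active."""
    if array.active_branch is not None:
        raise ValueError("array is still active elsewhere")
    array.active_branch = branch_name


array = VirtualArray()
activate(array, "first-branch")
deactivate(array, "first-branch", {3: b"updated"})
activate(array, "second-branch")
print(array.active_branch)  # second-branch
```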

Moreover, because the virtual storage array data 420 does not change location when a virtual storage array 415 is migrated to a new location, virtual storage arrays can be migrated frequently. For example, if an organization has a first branch in New York and a second branch in India, a virtual storage array may be migrated between these offices every work day. Because of the time differences between these two locations, the virtual storage array enables a 24-hour work cycle. During business hours in the New York branch, the New York branch will be given access to the virtual storage array. At the same time, it is late at night in India; thus this branch does not require access to the virtual storage array. When business hours are over in New York, the New York branch virtual storage array interface deactivates its virtual storage array access and completes any remaining updates to the virtual storage array data at the data center. Then, the India branch virtual storage array interface can activate virtual storage array access for the India branch. This allows the India branch to access the virtual storage array while the New York branch is closed for the night. At the end of business hours in India, this process is reversed and the New York branch reconnects with the virtual storage array.

In some cases, there may be some storage clients in a branch operating past business hours. In an embodiment, a virtual storage array interface at the branch can connect with the virtual storage array interface that is currently connected with the virtual storage array data via the WAN to provide after-hours storage clients access to the virtual storage array. For example, in FIG. 4B, the virtual storage array data 420 is accessed by virtual storage array 415 b currently provided by virtual storage array interface 430 located at the second branch 435. If a client system 445 at the first branch 410 needs to access data in the virtual storage array 415 b, the client system 445 contacts the first virtual storage array interface 405. The first virtual storage array interface 405 then contacts the second virtual storage array interface 430 to access the virtual storage array 415 b.

In a further embodiment, one or more virtual machines executing virtual storage array applications, application servers, and/or other applications may migrate with a virtual storage array between two or more branches. In this embodiment, an application server, such as a database application or an e-mail server and its associated data storage, implemented using a virtual storage array, may move together between branches. Because the application server is implemented within a virtual machine, this migration between branches may be seamless from the perspective of the application server.

FIG. 5 illustrates a method 500 of creating data snapshots of a virtual storage array according to an embodiment of the invention. An embodiment of the method 500 begins in step 505 with the initiation of a virtual storage array checkpoint. A virtual storage array checkpoint may be initiated automatically by a branch virtual storage interface according to a schedule or based on criteria, such as the amount of data changed since the last checkpoint. In a further embodiment, a virtual storage array checkpoint may be initiated in response to a request for a virtual storage array snapshot from a system administrator or administration application.

To create a virtual storage array checkpoint, in an embodiment of the method 500, step 510 sets the branch virtual storage array interface to a quiescent state. This entails completing any pending operations with storage clients (though not necessarily background operations between the branch and data center virtual storage array interfaces). While in the quiescent state, the branch virtual storage interface will not accept any new storage operations from storage clients.

Once the branch virtual storage array interface is set to a quiescent state by step 510, in step 515, an embodiment of the branch virtual storage array interface identifies updated storage blocks in its associated virtual storage array cache. These updated storage blocks include data that has been created or updated by storage clients but has yet to be transferred via the WAN back to the data center LAN for storage in the physical data storage array.

Once all of the updated storage blocks have been identified, in step 515 an embodiment of the branch virtual storage array creates a checkpoint data structure. The checkpoint data structure specifies a time of checkpoint creation and the set of updated storage blocks at that moment of time. Following the creation of the checkpoint data structure, in an embodiment of the method 500, step 520 reactivates the branch's virtual storage array. Following step 520, the branch virtual storage array interface can resume servicing storage operations from storage clients. Additionally, the branch virtual storage array may resume transferring new or updated storage blocks via the WAN to the data center LAN for storage in the physical data storage array. In a further embodiment, the virtual storage array cache may maintain a copy of an updated storage block even after a copy is transferred back to the data center LAN for storage. This allows subsequent snapshots to be created based on this data.

In an embodiment, following the reactivation of the virtual storage array in step 520, the virtual storage array interface preserves the updated storage blocks specified by the checkpoint data structure from further changes. If a storage client attempts to update a storage block that is associated with a checkpoint, an embodiment of the virtual storage array interface creates a duplicate of this storage block in the virtual storage array cache to store the updated data. This preserves the data of this storage block at the time of the checkpoint for potential future reference.
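
Checkpoint creation and the preservation of checkpointed blocks can be sketched as follows. This is a minimal sketch under stated assumptions: the class CheckpointingCache is hypothetical, and preservation is modeled by freezing a copy of the address-to-data mapping, so that a later write rebinds the address in the live mapping while the checkpoint keeps the earlier data.

```python
import time


class CheckpointingCache:
    """Hypothetical cache of updated blocks that supports checkpoints."""

    def __init__(self):
        self.current = {}      # address -> latest data for updated blocks
        self.checkpoints = []  # list of (creation time, frozen address -> data map)

    def write(self, address, data):
        # Only the live mapping changes; frozen checkpoint maps are untouched.
        self.current[address] = data

    def checkpoint(self):
        # Record the creation time and the set of updated blocks at this moment.
        frozen = dict(self.current)
        self.checkpoints.append((time.time(), frozen))
        return frozen


cache = CheckpointingCache()
cache.write(7, b"old")
frozen = cache.checkpoint()
cache.write(7, b"new")
print(frozen[7], cache.current[7])  # b'old' b'new'
```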

Optionally, an embodiment of the method 500 may initiate one or more additional virtual storage array checkpoints at later times or in response to criteria or conditions. Embodiments of the virtual storage array interface may maintain any arbitrary number of checkpoint data structures and automatically delete outdated checkpoint data structures. For example, a branch virtual storage interface may maintain only the most recently created checkpoint data structure, or checkpoint data structures from the beginning of the most recent business day and the most recent hour.

At some point, a system administrator or administration application may request a snapshot of the virtual storage array data. A snapshot of the virtual storage array data represents the complete set of virtual storage array data at a specific moment of time. Step 525 receives a snapshot request from a system administrator or administration application. In response to a snapshot request, in step 530, an embodiment of a branch virtual storage array interface transfers a copy of the appropriate checkpoint data structure to the data center virtual storage interface. Additionally, the branch virtual storage array interface transfers a copy of any updated storage blocks specified by this checkpoint data structure.

In an embodiment, the data center virtual storage array interface creates a snapshot of the data of the virtual storage array. The snapshot includes a copy of all of the virtual storage array data in the physical data storage array unchanged from the time of creation of the checkpoint data structure. The snapshot also includes a copy of the updated storage blocks specified by the checkpoint data structure. An embodiment of the data center virtual storage array interface may store the snapshot in the physical storage array or using a data backup. In an embodiment, the data center virtual storage array interface automatically sends storage operations to the physical storage array interface to create a snapshot from a checkpoint data structure. These storage operations can be carried out in the background by the data center virtual storage array interface in addition to translating virtual storage array operations from one or more branch virtual storage array interfaces into corresponding physical storage array operations.
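
Conceptually, the snapshot is the physical data as of the checkpoint with the checkpoint's updated blocks laid over it. The following sketch illustrates this with plain dictionaries; build_snapshot and its arguments are hypothetical, and a real storage array would not copy every block in this way.

```python
def build_snapshot(physical_blocks, checkpoint_blocks):
    """Combine data center blocks with the updated blocks named by a checkpoint."""
    snapshot = dict(physical_blocks)    # data unchanged since the checkpoint
    snapshot.update(checkpoint_blocks)  # blocks updated at the branch take precedence
    return snapshot


physical = {1: b"a", 2: b"b"}
checkpoint = {2: b"b2", 3: b"c"}
print(build_snapshot(physical, checkpoint))  # {1: b'a', 2: b'b2', 3: b'c'}
```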

As described above, storage clients can interact with virtual storage arrays in the same manner that they would interact with physical storage arrays. This includes issuing storage commands to the branch virtual storage interface using storage array network protocols such as iSCSI or Fibre Channel protocol. Most storage array network protocols organize data according to storage blocks, each of which has a unique storage address or location. A storage block's unique storage address may include a logical unit number (using the SCSI protocol) or other representation of a logical volume.

In an embodiment, the virtual storage arrays provided by branch virtual storage interfaces allow storage clients to access storage blocks by their unique storage address within the virtual storage array. However, because one or more virtual storage arrays actually store their data within a physical storage array, for example implemented as a physical storage area network, an embodiment of the invention allows arbitrary mappings between the unique storage addresses of storage blocks in the virtual storage array and the corresponding unique storage addresses in one or more physical storage arrays. In an embodiment, the mapping between virtual and physical storage addresses may be performed by a branch virtual storage array interface and/or by a data center virtual storage array interface. Furthermore, there may be multiple levels of mapping between a branch virtual storage array and the physical storage array.

In an embodiment, storage blocks in the virtual storage array may be of a different size and/or structure than the corresponding storage blocks in the physical storage array. For example, if data compression is applied to the storage data, then the physical storage array data blocks may be smaller than the storage blocks of the virtual storage array, to take advantage of data storage savings. In an embodiment, the branch and/or data center virtual storage array interfaces map one or more virtual storage array storage blocks to one or more physical storage array storage blocks. Thus, a virtual storage array storage block can correspond with a fraction of a physical storage array storage block, a single physical storage array storage block, or multiple physical storage array storage blocks, as required by the configuration of the virtual and physical storage arrays.
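
The three cases named above (a fraction of a physical block, exactly one, or several) can be illustrated with a simple linear layout. The sketch below is only one possible mapping, since the embodiments allow arbitrary mappings; physical_span and its parameters are hypothetical.

```python
def physical_span(virtual_block, virtual_size, physical_size):
    """Return (first, last) physical block indices covering one virtual block.

    Assumes a linear layout in which virtual byte N is stored at physical byte N.
    """
    start_byte = virtual_block * virtual_size
    end_byte = start_byte + virtual_size - 1
    return (start_byte // physical_size, end_byte // physical_size)


print(physical_span(3, 4096, 4096))   # (3, 3): one-to-one
print(physical_span(3, 4096, 1024))   # (12, 15): four physical blocks
print(physical_span(3, 4096, 16384))  # (0, 0): a fraction of one physical block
```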

In a further embodiment, the branch and data center virtual storage array interfaces may reorder or regroup storage operations from storage clients to improve efficiency of data optimizations such as data compression. For example, if two storage clients are simultaneously accessing the same virtual storage array, then these storage operations will be intermixed when received by the branch virtual storage array interface. An embodiment of the branch and/or data center virtual storage array interface can reorder or regroup these storage operations according to storage client, type of storage operation, data or application type, or any other attribute or criteria to improve virtual storage array performance and efficiency. For example, a virtual storage array interface can group storage operations by storage client and apply data compression to each storage client's operations separately, which is likely to provide greater data compression than compressing all storage operations together. FIG. 6 illustrates an example 600 of optimized data compression and deduplication using file-system or other storage format awareness, such as database nodes, according to an embodiment of the invention. In the example 600, incoming requests for file system blocks or clusters are regrouped and reordered based on their associated file system file and their position within their respective files.
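
The regrouping described for example 600 can be sketched as grouping intermixed requests by file and ordering each group by position within the file. This is an illustrative sketch; regroup and the sample file names are hypothetical.

```python
from collections import defaultdict


def regroup(requests):
    """Group (file name, file block index) requests by file, ordered by position."""
    by_file = defaultdict(list)
    for name, index in requests:
        by_file[name].append(index)
    return {name: sorted(indices) for name, indices in by_file.items()}


arrivals = [("a.doc", 2), ("b.xls", 0), ("a.doc", 0), ("b.xls", 1), ("a.doc", 1)]
print(regroup(arrivals))  # {'a.doc': [0, 1, 2], 'b.xls': [0, 1]}
```

Each group can then be handed to a compression or deduplication stage as one contiguous run.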

In an embodiment, unique storage labels can be assigned to storage blocks or groups of storage blocks in the virtual storage array cache. These unique storage labels can be determined arbitrarily or based on the data included in storage blocks, for example using hashes or hashes of hashes. Furthermore, hierarchical labels may be assigned to storage blocks. A hierarchical label is associated with a sequence of one or more additional labels. Each of these additional labels is associated with either a storage block or one or more additional labels. By assigning labels to storage blocks, WAN optimization techniques can be further applied to virtual storage array data traffic between the branch LAN and the data center LAN.
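
Content-based labels and hierarchical labels can be sketched with a cryptographic hash. The choice of SHA-256 and the function names block_label and hierarchical_label are assumptions made for illustration; the embodiments do not specify a particular hash.

```python
import hashlib


def block_label(data):
    """Label a storage block by the SHA-256 hash of its contents."""
    return hashlib.sha256(data).hexdigest()


def hierarchical_label(child_labels):
    """Label a sequence of labels by hashing them together (a hash of hashes)."""
    joined = "".join(child_labels).encode("ascii")
    return hashlib.sha256(joined).hexdigest()


leaves = [block_label(b"block one"), block_label(b"block two")]
parent = hierarchical_label(leaves)
print(len(parent))  # 64
```

Because identical block contents yield identical labels, a label can be sent across the WAN in place of a block that the receiving side already holds.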

Embodiments of the invention can implement virtual storage array interfaces at the branch and/or data center as standalone devices or as part of other devices, computer systems, or applications. FIG. 7 illustrates an example virtual machine implementation 700 of a virtual storage array interface according to an embodiment of the invention. In this example virtual machine implementation 700, the virtual storage array interface 705 is implemented as a software application executed by a virtual machine 710. The virtual machine 710 is located in this example within a network optimizer device 715; however, other embodiments of this virtual machine implementation 700 can be located within other types of network devices, including switches, routers, and storage devices and interfaces.

In an embodiment, the virtual machine 710 implementing the virtual storage interface is optionally connected with an internal or external data storage device to act as a virtual storage array cache 720.

In an embodiment, the network optimizer 715 includes LAN and WAN network connections 725 and 730 for intercepting network traffic. A virtual machine hardware and software interface 740 is connected with these network connections to allow the virtual machine to send and receive network communications. In this example, the network optimizer also includes a network optimization module 735 for performing WAN optimization techniques on network traffic passing between the LAN and the WAN network connections 725 and 730.

In a further embodiment, the network optimizer 715 or other host device may include multiple virtual machines for executing additional applications, application servers, and/or performing additional data processing functions. For example, a network optimizer device can include a first virtual machine for implementing a virtual storage array interface to a virtual storage array; a second virtual machine for implementing an application server, such as a database application; and a third virtual machine executing a data processing application, such as an anti-virus scanning application. In this example, the virtual machines can communicate with each other as well as with other entities connected via the local and wide area networks.

FIG. 8 illustrates an example computer system capable of implementing a virtual storage array interface according to an embodiment of the invention. FIG. 8 is a block diagram of a computer system 2000, such as a personal computer or other digital device, suitable for practicing an embodiment of the invention. Embodiments of computer system 2000 may include dedicated networking devices, such as wireless access points, network switches, hubs, routers, hardware firewalls, network traffic optimizers and accelerators, network attached storage devices, storage array network interfaces, and combinations thereof.

Computer system 2000 includes a central processing unit (CPU) 2005 for running software applications and optionally an operating system. CPU 2005 may be comprised of one or more processing cores. Memory 2010 stores applications and data for use by the CPU 2005. Examples of memory 2010 include dynamic and static random access memory. Storage 2015 provides non-volatile storage for applications and data and may include fixed or removable hard disk drives, flash memory devices, ROM memory, and CD-ROM, DVD-ROM, Blu-ray, HD-DVD, or other magnetic, optical, or solid state storage devices. In a further embodiment, CPU 2005 may execute virtual machine software applications to create one or more virtual processors capable of executing additional software applications and optional additional operating systems.

Optional user input devices 2020 communicate user inputs from one or more users to the computer system 2000, examples of which may include keyboards, mice, joysticks, digitizer tablets, touch pads, touch screens, still or video cameras, and/or microphones. In an embodiment, user input devices may be omitted and computer system 2000 may present a user interface to a user over a network, for example using a web page or network management protocol and network management software applications.

Computer system 2000 includes one or more network interfaces 2025 that allow computer system 2000 to communicate with other computer systems via an electronic communications network, and may include wired or wireless communication over local area networks and wide area networks such as the Internet. Computer system 2000 may support a variety of networking protocols at one or more levels of abstraction. For example, computer system 2000 may support networking protocols at one or more layers of the seven-layer OSI network model. An embodiment of network interface 2025 includes one or more wireless network interfaces adapted to communicate with wireless clients and with other wireless networking devices using radio waves, for example using the 802.11 family of protocols, such as 802.11a, 802.11b, 802.11g, and 802.11n.

An embodiment of the computer system 2000 may also include a wired networking interface, such as one or more Ethernet connections to communicate with other networking devices via local or wide-area networks.

The components of computer system 2000, including CPU 2005, memory 2010, data storage 2015, user input devices 2020, and network interface 2025, are connected via one or more data buses 2060. Additionally, some or all of the components of computer system 2000, including CPU 2005, memory 2010, data storage 2015, user input devices 2020, and network interface 2025, may be integrated together into one or more integrated circuits or integrated circuit packages. Furthermore, some or all of the components of computer system 2000 may be implemented as application specific integrated circuits (ASICs) and/or programmable logic.

Further embodiments can be envisioned by one of ordinary skill in the art after reading the attached documents. For example, embodiments of the invention can be used with any number of network connections and may be added to any type of network device, client or server computer, or other computing device in addition to the computer illustrated above. In other embodiments, combinations or sub-combinations of the above disclosed invention can be advantageously made. The block diagrams of the architecture and flow charts are grouped for ease of understanding. However, it should be understood that combinations of blocks, additions of new blocks, re-arrangement of blocks, and the like are contemplated in alternative embodiments of the present invention.

The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. It will, however, be evident that various modifications and changes may be made thereunto without departing from the broader spirit and scope of the invention as set forth in the claims. 

1. A method of optimizing a block storage protocol read access to a block storage device via a wide area network, the method comprising: receiving a storage request specifying at least a first storage block from a storage client, wherein the storage client is connected with a wide area network at a first network location; identifying at least a first portion of a set of file system entities corresponding with the first storage block; identifying at least a second portion of the set of file system entities likely to be associated with a future storage request based on the first portion of the set of file system entities; identifying at least a second storage block corresponding with the second portion of the set of file system entities; retrieving the second storage block from a data storage connected with the wide area network at a second network location; communicating via the wide area network the second storage block from the data storage to a storage block cache at the first network location; and storing the second storage block in the storage block cache.
 2. The method of claim 1, wherein the first portion of the set of file system entities and the second portion of the set of file system entities include a first one of the set of file system entities.
 3. The method of claim 1, wherein the first portion of the set of file system entities includes a first one of the set of file system entities and the second portion of the set of file system entities includes a second one of the set of file system entities.
 4. The method of claim 1, wherein the set of file system entities includes a file system entity.
 5. The method of claim 1, wherein the set of file system entities includes a directory.
 6. The method of claim 1, wherein the set of file system entities includes a file system data structure.
 7. The method of claim 1, wherein identifying at least the first portion of a set of file system entities corresponding with the first storage block comprises: accessing a storage structure database including mappings from storage block locations to portions of the set of file system entities.
 8. The method of claim 1, wherein identifying at least the second storage block corresponding with the second portion of the set of file system entities comprises: accessing a data storage structure including previously determined mappings from portions of the set of file system entities to storage block locations.
 9. The method of claim 1, comprising: receiving a second storage request from the storage client; determining if the second storage request includes a request for the second storage block; in response to the determination that the second storage request includes the request for the second storage block, retrieving the second storage block from the storage block cache at the first network location; and in response to the determination that the second storage request does not include the request for the second storage block, retrieving at least one additional storage block from the data storage connected with the wide area network at the second network location.
 10. A method of optimizing a block storage protocol read access to a block storage device via a wide area network, the method comprising: receiving a storage request specifying at least a first storage block from a storage client, wherein the storage client is connected with a wide area network at a first network location; identifying at least a first portion of a set of database entities corresponding with the first storage block; identifying at least a second portion of the set of database entities likely to be associated with a future storage request based on the first portion of the set of database entities; identifying at least a second storage block corresponding with the second portion of the set of database entities; retrieving the second storage block from a data storage connected with the wide area network at a second network location; communicating via the wide area network the second storage block from the data storage to a storage block cache at the first network location; and storing the second storage block in the storage block cache.
 11. The method of claim 10, wherein the first portion of the set of database entities and the second portion of the set of database entities include a first one of the set of database entities.
 12. The method of claim 10, wherein the first portion of the set of database entities includes a first one of the set of database entities and the second portion of the set of database entities includes a second one of the set of database entities.
 13. The method of claim 10, wherein the set of database entities includes a table.
 14. The method of claim 10, wherein the set of database entities includes a database system node.
 15. The method of claim 10, wherein identifying at least the first portion of a set of database entities corresponding with the first storage block comprises: accessing a storage structure database including mappings from storage block locations to portions of the set of database entities.
 16. The method of claim 10, wherein identifying at least the second storage block corresponding with the second portion of the set of database entities comprises: accessing a data storage structure including previously determined mappings from portions of the set of database entities to storage block locations.
 17. The method of claim 10, comprising: receiving a second storage request from the storage client; determining if the second storage request includes a request for the second storage block; in response to the determination that the second storage request includes the request for the second storage block, retrieving the second storage block from the storage block cache at the first network location; and in response to the determination that the second storage request does not include the request for the second storage block, retrieving at least one additional storage block from the data storage connected with the wide area network at the second network location.
 18. A method of optimizing a block storage protocol write access to a block storage device via a wide area network, the method comprising: receiving a storage request specifying at least a first storage block from a storage client connected with a wide area network at a first network location; determining if a storage block cache has sufficient capacity to store at least the first storage block; and in response to the determination that the storage block cache has sufficient capacity to store at least the first storage block: storing the first storage block in the storage block cache; sending a storage request acknowledgement to the storage client indicating that the storage request is complete; and following the storage request acknowledgement, communicating the first storage block via the wide area network to a data storage connected with the wide area network at a second network location, wherein the data storage is adapted to store the first storage block.
 19. The method of claim 18, wherein the storage block cache is located at the first network location and is connected with the storage client via a first local network.
 20. The method of claim 18, further comprising: in response to the determination that the storage block cache does not have sufficient capacity to store at least the first storage block: communicating the first storage block via the wide area network to a data storage connected with the wide area network at a second network location; receiving a first storage request acknowledgement from the data storage, wherein the first storage request acknowledgement indicates that the data storage has stored the first storage block; and following the receipt of the first storage request acknowledgement, sending a second storage request acknowledgement to the storage client indicating that the storage request is complete.
 21. A method of preserving data in a data storage device, the method comprising: setting a storage interface connected with a wide area network at a first network location to a quiescent state; identifying a first set of storage blocks in a storage block cache connected at the first network location that has changed since its initial storage in the storage block cache; setting the storage interface to an active state; following the storage interface setting to the active state, transferring the first set of storage blocks via the wide area network to a second network location; and storing a data snapshot on a data storage at the second network location, wherein the snapshot includes the first set of storage blocks.
 22. The method of claim 21, wherein transferring is in response to a snapshot request received from an administration application.
 23. The method of claim 21, wherein the data snapshot includes a copy of a second set of storage blocks stored by the data storage, wherein the second set of storage blocks is unchanged since the time that the storage interface is set to the quiescent state.
 24. The method of claim 23, wherein the second set of storage blocks was previously stored by the data storage prior to the storage interface being set to the quiescent state.
 25. The method of claim 21, comprising: receiving, prior to transferring the first set of storage blocks, a first modification to at least a portion of the first set of storage blocks; in response to receiving the first modification, creating a copy of at least the portion of the first set of storage blocks; applying the first modification to the copy of at least the portion of the first set of storage blocks; and preserving the unmodified portion of the first set of storage blocks for transfer to the second network location.
 26. The method of claim 25, comprising: receiving a storage request from a storage client at the first network location, wherein the storage request specifies at least the portion of the first set of storage blocks; and in response to the storage request, providing the modified copy of the portion of the first set of storage blocks to the storage client.
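
By way of illustration only, and without limiting the claims, the read prefetching of claims 1 and 7 to 9 and the write acknowledgement ordering of claims 18 and 20 might be sketched as follows. This is a minimal sketch under stated assumptions: the class name BranchStorageInterface, its fields, and the returned acknowledgement strings are hypothetical and do not appear in the specification. Plain dictionaries stand in for the storage structure database and for the data storage reached over the wide area network. The prediction rule (prefetch the remaining blocks of the same file) is only one possible heuristic, and the fixed-capacity cache has no eviction policy.

```python
from dataclasses import dataclass, field


@dataclass
class BranchStorageInterface:
    """Hypothetical branch-side storage interface; every name is illustrative."""
    remote: dict            # block number -> bytes; stands in for the WAN data storage
    block_to_entity: dict   # block number -> file system entity (here, a file path)
    entity_to_blocks: dict  # file system entity -> list of its block numbers
    capacity: int = 8       # assumed maximum number of blocks in the branch cache
    cache: dict = field(default_factory=dict)
    dirty: set = field(default_factory=set)
    wan_reads: int = 0      # counts simulated WAN block fetches

    def _has_room(self):
        return len(self.cache) < self.capacity

    def _fetch(self, block):
        # Simulated retrieval of one block over the wide area network.
        self.wan_reads += 1
        return self.remote[block]

    def read(self, block):
        # Cache hit: serve locally, with no WAN traffic.
        if block in self.cache:
            return self.cache[block]
        data = self._fetch(block)
        if self._has_room():
            self.cache[block] = data
        self._prefetch(block)
        return data

    def _prefetch(self, block):
        # Map the requested block to its file system entity, then fetch the
        # entity's other blocks (assumed heuristic: same file, claim 2 style).
        entity = self.block_to_entity.get(block)
        for other in self.entity_to_blocks.get(entity, []):
            if other not in self.cache and self._has_room():
                self.cache[other] = self._fetch(other)

    def write(self, block, data):
        # With sufficient cache capacity, acknowledge before any WAN transfer.
        if block in self.cache or self._has_room():
            self.cache[block] = data
            self.dirty.add(block)
            return "ack-before-wan"
        # Otherwise store at the data storage first, then acknowledge.
        self.remote[block] = data
        return "ack-after-wan"

    def flush(self):
        # Transfer previously acknowledged writes to the data storage.
        for block in sorted(self.dirty):
            self.remote[block] = self.cache[block]
        self.dirty.clear()
```

In this sketch, a read of one block of a file populates the cache with that file's remaining blocks, so a later request for any of them is served at the branch. A write is acknowledged immediately when the cache has room and reaches the data storage only when flush is called; when the cache is full, the acknowledgement follows the transfer to the data storage.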